Modeling Annotator Accuracies for Supervised Learning

Authors

  • Abhimanu Kumar
  • Matthew Lease
Abstract

Crowdsourcing [5] methods are quickly changing the landscape for the quantity, quality, and type of labeled data available to supervised learning. While such data can now be obtained more quickly and cheaply than ever before, the resulting labels also tend to be far noisier due to limitations of current quality control mechanisms and processes. Given such noisy labels and a supervised learner, an important question is therefore how labeling effort can best be allocated to maximize learner accuracy. For example, should we (a) label additional unlabeled examples, or (b) generate additional labels for already-labeled examples in order to reduce label noise [12]? In comparison to prior work, we show that faster learning can be achieved for case (b) by incorporating knowledge of worker accuracies into consensus labeling [13]. Evaluation on four binary classification tasks with simulated annotators shows the empirical importance of modeling annotator accuracies.
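The abstract does not spell out the consensus-labeling scheme, but the core idea of weighting each annotator's vote by a known accuracy can be sketched as follows. This is a minimal illustration under a simple Bernoulli noise model, not the paper's actual method; the function name and log-odds weighting are assumptions for the example.

```python
import math

def weighted_vote(labels, accuracies):
    """Combine binary labels (0/1) from several annotators into one
    consensus label, weighting each vote by the annotator's accuracy.

    Under a simple model where annotator i flips the true label with
    probability 1 - p_i, the optimal weight for vote i is the log-odds
    log(p_i / (1 - p_i)), so highly accurate annotators dominate.
    """
    score = 0.0
    for y, p in zip(labels, accuracies):
        w = math.log(p / (1.0 - p))  # log-odds weight; requires 0 < p < 1
        score += w if y == 1 else -w
    return 1 if score > 0 else 0

# Two weak annotators (55% accurate) say 0; one strong annotator (95%) says 1.
# Plain majority vote would output 0, but the accuracy-weighted vote sides
# with the strong annotator.
print(weighted_vote([0, 0, 1], [0.55, 0.55, 0.95]))  # -> 1
```

Note that when all annotators share the same accuracy, the weights are equal and this reduces to plain majority voting.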


Similar articles

Modeling Multiple Annotator Expertise in the Semi-Supervised Learning Scenario

Learning algorithms normally assume that there is at most one annotation or label per data point. However, in some scenarios, such as medical diagnosis and on-line collaboration, multiple annotations may be available. In either case, obtaining labels for data points can be expensive and time-consuming (in some circumstances ground truth may not exist). Semi-supervised learning approaches have sh...


Modeling annotator expertise: Learning when everybody knows a bit of something

Supervised learning from multiple labeling sources is an increasingly important problem in machine learning and data mining. This paper develops a probabilistic approach to this problem when annotators may be unreliable (labels are noisy), but also their expertise varies depending on the data they observe (annotators may have knowledge about different parts of the input space). That is, an anno...


Modeling Annotator Rationales with Application to Pneumonia Classification

We present a technique to leverage annotator rationale annotations for ventilator assisted pneumonia (VAP) classification. Given an annotated training corpus of 1344 narrative chest X-ray reports, we report results for two supervised classification tasks: Critical Pulmonary Infection Score (CPIS) and the likelihood of Pneumonia (PNA). For both tasks, our training data contain annotator rational...


Active Learning from Multiple Knowledge Sources

Some supervised learning tasks do not fit the usual single annotator scenario. In these problems, ground-truth may not exist and multiple annotators are generally available. A few approaches have been proposed to address this learning problem. In this setting active learning (AL), the problem of optimally selecting unlabeled samples for labeling, offers new challenges and has received little at...


Active Learning from Crowds

Obtaining labels can be expensive or time-consuming, but unlabeled data is often abundant and easier to obtain. Most learning tasks can be made more efficient, in terms of labeling cost, by intelligently choosing specific unlabeled instances to be labeled by an oracle. The general problem of optimally choosing these instances is known as active learning. As it is usually set in the context of su...
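The blurb above defines active learning as intelligently choosing which unlabeled instances to send to the oracle. A common baseline criterion, shown here only as an illustrative sketch (the function name and the use of uncertainty sampling are assumptions, not this paper's method), is to pick the instance the current model is least sure about:

```python
def pick_most_uncertain(probs):
    """Uncertainty sampling for binary classification: given the model's
    predicted positive-class probability for each unlabeled instance,
    return the index of the instance closest to 0.5, i.e. the one the
    current model is least confident about."""
    return min(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))

# The model is confident about instances 0 and 2; instance 1 (p = 0.52)
# is nearest the decision boundary, so it is queried next.
print(pick_most_uncertain([0.9, 0.52, 0.1, 0.7]))  # -> 1
```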


Publication date: 2011